Entity resolution on-demand

Abstract

Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire data before consuming it. Yet, determining which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously-growing data. Similarly, when querying data lakes, we want to transform data on-demand and return the results in a timely manner, a fundamental requirement of ELT (Extract-Load-Transform) pipelines. We propose BrewER, a framework to evaluate SQL SP queries on dirty data while progressively returning results as if they were issued on cleaned data. BrewER tries to focus the cleaning effort on one entity at a time, following an ORDER BY predicate. Thus, it inherently supports top-k and stop-and-resume execution. For a wide range of applications, a significant amount of resources can be saved. We exhaustively show the efficacy of BrewER on four datasets.
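
The idea of cleaning one entity at a time in ORDER BY order can be illustrated with a small Python sketch. This is not BrewER's actual algorithm or API, only a simplified illustration under stated assumptions: the records are already grouped into candidate blocks, the query orders by ascending `price`, each entity's price is the minimum over its records, and the names `resolve_block`, `merge`, `query_on_demand`, `same_product`, and the sample data are all hypothetical. Because an entity's minimum price can never be lower than the minimum price of its block, a priority queue keyed on that bound lets entities be emitted in the correct order while untouched blocks stay uncleaned.

```python
import heapq
from itertools import count


def resolve_block(block, match):
    """Cluster one candidate block into entities (transitive closure of `match`)."""
    clusters = []
    for rec in block:
        merged, rest = [rec], []
        for cluster in clusters:
            if any(match(rec, other) for other in cluster):
                merged.extend(cluster)   # rec links this cluster to the new one
            else:
                rest.append(cluster)
        clusters = rest + [merged]
    return clusters


def merge(cluster):
    """Hypothetical aggregation: longest name variant, lowest price."""
    return {
        "name": max((r["name"] for r in cluster), key=len),
        "price": min(r["price"] for r in cluster),
    }


def query_on_demand(blocks, match, where, limit=None):
    """Yield cleaned entities ordered by ascending price, resolving blocks lazily."""
    tie = count()  # unique tie-breaker so the heap never compares payloads
    heap = []
    for block in blocks:
        bound = min(r["price"] for r in block)  # lower bound for every entity in the block
        heapq.heappush(heap, (bound, next(tie), False, block))
    emitted = 0
    while heap and (limit is None or emitted < limit):
        _, _, resolved, payload = heapq.heappop(heap)
        if resolved:
            if where(payload):  # the selection is evaluated on the cleaned entity
                emitted += 1
                yield payload
        else:
            # Pay the cleaning cost only now, for this block only.
            for cluster in resolve_block(payload, match):
                entity = merge(cluster)
                heapq.heappush(heap, (entity["price"], next(tie), True, entity))


def same_product(a, b):
    """Toy matcher (an assumption): records match if their first two tokens agree."""
    return a["name"].lower().split()[:2] == b["name"].lower().split()[:2]


blocks = [
    [{"name": "Canon EOS 5D", "price": 900},
     {"name": "canon eos 5d mark", "price": 850},
     {"name": "Canon PowerShot", "price": 300}],
    [{"name": "Nikon D750", "price": 700},
     {"name": "Nikon D750 body", "price": 650}],
    [{"name": "Sony A7", "price": 1200}],
]

# Roughly: SELECT ... WHERE price >= 500 ORDER BY price ASC LIMIT 2
top2 = list(query_on_demand(blocks, same_product, lambda e: e["price"] >= 500, limit=2))
print([(e["name"], e["price"]) for e in top2])
```

In this toy run the third block is never resolved, since the two requested results are produced before its bound reaches the top of the queue. Writing the query as a generator also gives a simple form of stop-and-resume: the caller can stop iterating and continue later from the same state.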

Similar resources

Entity Resolution on Complex Network

Complex networks can be used to describe the Internet, social networks, or, more broadly, a binary relation over a set of objects. The structural information of a complex network helps identify the entities corresponding to nodes in the network. There is much research in this area, and the authors introduce these studies and their results in this chapter. The authors mainly present two ...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Adaptive Temporal Entity Resolution on Dynamic Databases

Entity resolution is the process of matching records that refer to the same entities from one or several databases in situations where the records to be matched do not include unique entity identifiers. Matching therefore has to rely upon partially identifying information, such as names and addresses. Traditionally, entity resolution has been applied in batch-mode and on static databases. Howev...

On Link Validity and Entity Resolution

The Entity Resolution problem has been widely addressed in the literature. In its simplest version, the problem takes as input a knowledge base composed of records describing real-world entities and outputs the sets of records judged to correspond to the same real-world entity. More elaborate versions take into account links amongst records representing relationships between the entities which...

On Entity Resolution for Probabilistic Data

Entity resolution (ER) is the problem of identifying duplicate tuples, which are the tuples that represent the same real-world entity. There are many real-life applications in which the ER problem arises. These applications range from news aggregation websites, identifying the news that cover the same story, in order to avoid presenting one story several times to the user, to the integration of...

Journal

Journal title: Proceedings of the VLDB Endowment

Year: 2022

ISSN: 2150-8097

DOI: https://doi.org/10.14778/3523210.3523226